# **CSE332: Problems & Solutions: Lecture 7**

Define clock signal, clock frequency, clock period, cycles per instruction (CPI), and average cycles per instruction.

List the factors that affect instruction count, CPI, and clock frequency.

### What is the **CPU clock signal**?

The clock signal is a highly precise, square-wave-like electronic signal that the CPU uses to sequence and synchronize its operations.

The frequency of this timing signal, expressed in MHz or GHz, is used as one of the performance measures of a processor as well as of a computer. The frequency of the clock signal depends on the transistor, the fundamental element of CPU design, and on the semiconductor and VLSI technology used to fabricate the transistors and other functional units of the CPU.

What is the **CPU clock signal**? Briefly discuss.

Each CPU/microprocessor requires a highly precise, square-wave-like electronic signal as input, called the clock signal.

This signal is often generated by an external IC and fed to the processor.

The operation of the CPU and of the computer hardware is strictly controlled by this signal.

CPUs/computers use this electronic signal to determine when events take place within the hardware.

Basic operations and the execution of any instruction are controlled by this signal.

The run time of any instruction or program also depends on this electronic signal.

If the time period of the clock signal is T = 1 ns, find the clock frequency.

Clock frequency,  $f = 1/T = 1/(10^{-9}\ \text{s}) = 10^{9}$  Hz = 1 GHz
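The relation $f = 1/T$ used here can be checked with a short Python sketch. The helper names `period_to_frequency` and `frequency_to_period` are illustrative and not part of the lecture material.

```python
def period_to_frequency(period_s: float) -> float:
    """Clock frequency in Hz for a clock period given in seconds (f = 1/T)."""
    if period_s <= 0:
        raise ValueError("clock period must be positive")
    return 1.0 / period_s


def frequency_to_period(freq_hz: float) -> float:
    """Clock period in seconds for a clock frequency given in Hz (T = 1/f)."""
    if freq_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return 1.0 / freq_hz


f = period_to_frequency(1e-9)           # T = 1 ns
print(f"f = {f / 1e9:.2f} GHz")          # formatted to two decimals
print(f"T = {frequency_to_period(2e9) * 1e9:.2f} ns")
```

Floating-point division may leave a tiny rounding error, so the values are formatted before printing rather than compared for exact equality.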

A processor uses a 2 GHz clock signal. If the processor takes 4 ns for a multiplication operation, calculate the CPI for multiplication.

Clock frequency: 2 GHz

Clock period:  $T = 1/(2 \times 10^{9}\ \text{Hz}) = (1/2) \times 10^{-9}$  sec = 0.5 ns

CPI for multiplication =  $\frac{4 \text{ ns}}{0.5 \text{ ns}} = 8$ 
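As a cross-check, the number of cycles an operation takes is its latency divided by the clock period, which equals latency times clock frequency. The function name `cycles_for_operation` below is an assumption made for this sketch.

```python
def cycles_for_operation(latency_s: float, clock_hz: float) -> float:
    """Clock cycles consumed by one operation: latency / period = latency * frequency."""
    return latency_s * clock_hz


# Multiplication takes 4 ns on a 2 GHz processor.
cpi_mul = cycles_for_operation(4e-9, 2e9)
print(round(cpi_mul))  # rounded to the nearest whole cycle
```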

### What do you understand by a 3 GHz clock signal?

A 3 GHz clock signal completes  $3 \times 10^{9}$  full cycles in one second.

Moreover, the time period of the signal is  $T = 1/(3 \times 10^{9})$  sec =  $(1/3) \times 10^{-9}$  sec = 1/3 ns ≈ 0.33 ns

Consider two different machines with two different instruction sets. Using two different compilers, each machine runs a program with the following distribution of instructions.

| Instruction Type | Machine-A (Clock rate 250 MHz): Instruction Count in Millions | Machine-A: CPI | Machine-B (Clock rate 300 MHz): Instruction Count in Millions | Machine-B: CPI |
|------------------|------|-----|------|-----|
| ARITHMETIC       | 8    | 1   | 7    | 1   |
| LOAD             | 6    | 5   | 8    | 4   |
| BRANCH           | 2    | 4   | 2    | 5   |
| STORE            | 4    | 2   | 3    | 3   |

Determine the average CPI.

Solution:

Machine-A

Average CPI:  $(8\times1+6\times5+2\times4+4\times2)/20 = 54/20 = 2.7$ 

Machine-B

Average CPI:  $(7 \times 1 + 8 \times 4 + 2 \times 5 + 3 \times 3)/20 = 58/20 = 2.9$ 
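The average CPI in both cases is total clock cycles divided by total instruction count. A minimal Python sketch of that weighted average follows; the function name `average_cpi` and the list-of-pairs layout are choices made for this illustration.

```python
def average_cpi(mix):
    """Average CPI for a list of (instruction_count, cpi) pairs.

    Computed as total clock cycles divided by total instruction count.
    """
    total_cycles = sum(count * cpi for count, cpi in mix)
    total_instructions = sum(count for count, _ in mix)
    return total_cycles / total_instructions


# (instruction count in millions, CPI) for ARITHMETIC, LOAD, BRANCH, STORE
machine_a = [(8, 1), (6, 5), (2, 4), (4, 2)]
machine_b = [(7, 1), (8, 4), (2, 5), (3, 3)]

print(f"Machine-A average CPI = {average_cpi(machine_a):.2f}")  # 54 / 20
print(f"Machine-B average CPI = {average_cpi(machine_b):.2f}")  # 58 / 20
```

Because the counts appear in both numerator and denominator, it does not matter that they are expressed in millions.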

A benchmark program is run on a 40 MHz processor. The executed program consists of 100,000 instruction executions, with the following instruction mix and clock cycle count:

| INSTRUCTION TYPE | INSTRUCTION COUNT | CYCLES PER INSTRUCTION |
|------------------|-------------------|------------------------|
| ARITHMETIC       | 45000             | 1            |
| DATA TRANSFER    | 32000             | 2            |
| FLOATING POINT   | 15000             | 2            |
| CONTROL TRANSFER | 8000              | 2            |

Determine the Average CPI for this program.

Solution:

Average CPI =  $(45000 \times 1 + 32000 \times 2 + 15000 \times 2 + 8000 \times 2)/100000 = 1.55$ 
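The same data also gives the execution time and MIPS rating on the 40 MHz processor. The problem only asks for the average CPI, so the last two quantities in this Python sketch are an optional extension, and the variable names are illustrative.

```python
CLOCK_HZ = 40e6  # 40 MHz processor from the problem statement

# instruction type -> (instruction count, cycles per instruction)
mix = {
    "arithmetic": (45_000, 1),
    "data transfer": (32_000, 2),
    "floating point": (15_000, 2),
    "control transfer": (8_000, 2),
}

total_instructions = sum(count for count, _ in mix.values())
total_cycles = sum(count * cpi for count, cpi in mix.values())

average_cpi = total_cycles / total_instructions
execution_time_s = total_cycles / CLOCK_HZ      # cycles * clock period
mips = CLOCK_HZ / (average_cpi * 1e6)           # millions of instructions per second

print(f"average CPI    = {average_cpi:.2f}")
print(f"execution time = {execution_time_s * 1e3:.3f} ms")
print(f"MIPS rating    = {mips:.2f}")
```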

Consider two different machines, with two different instruction sets, both of which have a clock rate of 200 MHz. The following measurements are recorded on the two machines running a given set of benchmark programs:

Machine A:

| INSTRUCTION TYPE | INSTRUCTION COUNT | CYCLES PER INSTRUCTION |
|------------------|-------------------|------------------------|
| ARITHMETIC       | 8                 | 1                      |
| LOAD and STORE   | 4                 | 3                      |
| BRANCH           | 2                 | 4                      |
| OTHERS           | 4                 | 3                      |

Machine B:

| INSTRUCTION TYPE | INSTRUCTION COUNT | CYCLES PER INSTRUCTION |
|------------------|-------------------|------------------------|
| ARITHMETIC       | 10                | 1                      |
| LOAD and STORE   | 8                 | 2                      |
| BRANCH           | 2                 | 4                      |
| OTHERS           | 4                 | 3                      |

Determine the Average CPI for each machine.

Machine-A: Average CPI:  $(8\times1+4\times3+2\times4+4\times3)/18 = 40/18 = 2.22$ 

Machine-B: Average CPI:  $(10\times1+8\times2+2\times4+4\times3)/24 = 46/24 = 1.92$ 

Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (class A, B, C, and D). P1 has a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 has a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2. Given a program with an instruction count of 1.0 × 10<sup>6</sup> instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster?

What is the global CPI for each implementation?

Consider two different implementations of the same instruction set architecture. There are four classes of instructions, A, B, C, and D. The clock rate and CPI of each implementation are given in the following table.

|    | CLOCK RATE | CPI class A | CPI class B | CPI class C | CPI class D |
|----|------------|-------------|-------------|-------------|-------------|
| P1 | 2.5 GHz    | 1           | 2           | 3           | 3           |
| P2 | 3 GHz      | 2           | 2           | 2           | 2           |

- a) Given a program with 10<sup>6</sup> instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster?
- b) What is the global CPI for each implementation?
- c) Find the clock cycles required in both cases.

#### Solution:

Class A: 10%

Class B: 20%

Class C: 50%

Class D: 20%

For P1:  $CPI = 0.1 \times 1 + 0.2 \times 2 + 0.5 \times 3 + 0.2 \times 3 = 2.6$ 

Clock cycles:  $2.6 \times 10^{6}$ 

Execution Time:  $10^{6} \times 2.6 \times 1/(2.5 \times 10^{9})$  sec =  $2.6 \times 10^{6} \times 0.4 \times 10^{-9}$  sec =  $1.04 \times 10^{-3}$  sec = 1.04 ms

For P2:  $CPI = 0.1 \times 2 + 0.2 \times 2 + 0.5 \times 2 + 0.2 \times 2 = 2.0$ 

Clock cycles:  $2.0 \times 10^{6}$ 

Execution Time:  $10^{6} \times 2.0 \times 1/(3 \times 10^{9})$  sec =  $(2/3) \times 10^{-3}$  sec ≈  $0.667 \times 10^{-3}$  sec = 0.667 ms

a) The P2 implementation is faster, since its execution time (about 0.667 ms) is shorter than that of P1 (1.04 ms).

b) For P1: Global CPI = 2.6

For P2: Global CPI = 2.0

c) For P1: Clock cycles:  $2.6 \times 10^{6}$ 

For P2: Clock cycles:  $2.0 \times 10^{6}$ 
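The comparison can be reproduced with a short Python sketch that takes the instruction count as $10^6$, as given in the problem statement. The dictionary layout and the name `global_cpi` are assumptions made for this illustration.

```python
INSTRUCTION_COUNT = 1.0e6  # 10^6 instructions, as in the problem statement

# Fraction of the program in classes A, B, C, D
class_fractions = [0.10, 0.20, 0.50, 0.20]

implementations = {
    "P1": {"clock_hz": 2.5e9, "cpi": [1, 2, 3, 3]},
    "P2": {"clock_hz": 3.0e9, "cpi": [2, 2, 2, 2]},
}


def global_cpi(fractions, cpis):
    """Weighted average CPI: sum of (class fraction * class CPI)."""
    return sum(f * c for f, c in zip(fractions, cpis))


results = {}
for name, impl in implementations.items():
    cpi = global_cpi(class_fractions, impl["cpi"])
    cycles = cpi * INSTRUCTION_COUNT
    time_s = cycles / impl["clock_hz"]
    results[name] = (cpi, cycles, time_s)
    print(f"{name}: CPI = {cpi:.2f}, cycles = {cycles:.3g}, time = {time_s * 1e3:.3f} ms")

# The faster implementation is the one with the smaller execution time.
faster = min(results, key=lambda name: results[name][2])
print("faster implementation:", faster)
```

The ranking does not depend on the instruction count, because both implementations run the same program; only the ratio CPI / clock rate matters.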

A RISC machine uses a 3 GHz clock. It runs a program containing  $1.5 \times 10^6$  instructions with the following instruction mix:

| Instruction type | CPI | Frequency of Instructions in the program |
|------------------|-----|------------------------------------------|
| ALU              | 3   | 20%                                      |
| Load             | 5   | 30%                                      |
| Store            | 5   | 35%                                      |
| Branch           | 6   | 15%                                      |

Suppose a CPU design enhancement reduces the CPI values of load and branch instructions by 2 and 3, respectively. What is the resulting performance improvement from this enhancement?
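No worked solution is given for this problem, so the following Python sketch is only one possible reading. It assumes that "reduced by 2 and 3" means 2 and 3 cycles are subtracted (Load 5 → 3, Branch 6 → 3), and that the clock rate and instruction count are unchanged, so the speedup equals old CPI divided by new CPI. All names are illustrative.

```python
CLOCK_HZ = 3e9
INSTRUCTION_COUNT = 1.5e6

# instruction type -> (CPI, fraction of instructions)
mix = {
    "ALU": (3, 0.20),
    "Load": (5, 0.30),
    "Store": (5, 0.35),
    "Branch": (6, 0.15),
}

# Assumption: the enhancement subtracts this many cycles from each type's CPI.
cpi_reduction = {"Load": 2, "Branch": 3}


def average_cpi(mix, reduction=None):
    """Weighted average CPI, optionally with per-type CPI reductions applied."""
    reduction = reduction or {}
    return sum((cpi - reduction.get(name, 0)) * frac
               for name, (cpi, frac) in mix.items())


old_cpi = average_cpi(mix)
new_cpi = average_cpi(mix, cpi_reduction)
old_time_s = INSTRUCTION_COUNT * old_cpi / CLOCK_HZ
new_time_s = INSTRUCTION_COUNT * new_cpi / CLOCK_HZ
speedup = old_time_s / new_time_s

print(f"old CPI = {old_cpi:.2f}, new CPI = {new_cpi:.2f}")
print(f"old time = {old_time_s * 1e3:.3f} ms, new time = {new_time_s * 1e3:.3f} ms")
print(f"speedup = {speedup:.2f}x")
```

If the intended meaning were instead a division of the CPI values by 2 and 3, only `cpi_reduction` and the subtraction in `average_cpi` would need to change.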

The following table shows the number of instructions for a program.

|    | ARITH | STORE | LOAD | BRANCH | TOTAL |
|----|-------|-------|------|--------|-------|
| A. | 650   | 100   | 600  | 50     | 1400  |
| B. | 750   | 250   | 500  | 500    | 2000  |

- a) Assuming that arithmetic instructions take 1 cycle, load and store 5 cycles, and branches 2 cycles, what is the execution time of the program in a 2 GHz processor?
- b) Find the CPI for the program.

#### Solution:

|    | ARITH | STORE | LOAD  | BRANCH | TOTAL |
|----|-------|-------|-------|--------|-------|
| A. | 650   | 100   | 600   | 50     | 1400  |
| B. | 750   | 250   | 500   | 500    | 2000  |
|    | CPI=1 | CPI=5 | CPI=5 | CPI=2  |       |

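For A:

Average CPI for A:  $(650 \times 1 + 100 \times 5 + 600 \times 5 + 50 \times 2)/1400 = 4250/1400 \approx 3.04$ 

Execution time for A on the 2 GHz processor:  $4250/(2 \times 10^{9})$  sec =  $2.125 \times 10^{-6}$  sec

If the number of <u>load instructions</u> can be reduced by one half, the total instruction count drops to 1100:

Average CPI for A:  $(650 \times 1 + 100 \times 5 + 300 \times 5 + 50 \times 2)/1100 = 2750/1100 = 2.5$ 

For B:

Average CPI for B:  $(750 \times 1 + 250 \times 5 + 500 \times 5 + 500 \times 2)/2000 = 5500/2000 = 2.75$ 

Execution time for B on the 2 GHz processor:  $5500/(2 \times 10^{9})$  sec =  $2.75 \times 10^{-6}$  sec

If the number of load instructions can be reduced by one half, the total instruction count drops to 1750:

Average CPI for B:  $(750 \times 1 + 250 \times 5 + 250 \times 5 + 500 \times 2)/1750 = 4250/1750 \approx 2.43$ 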
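When the load count is halved, both the cycle total and the instruction total change, so the divisor must be the new instruction count. The Python sketch below recomputes each case directly from the table; the names `summarize`, `CYCLES`, and `programs` are assumptions for this illustration.

```python
CLOCK_HZ = 2e9  # 2 GHz processor from part (a)
CYCLES = {"arith": 1, "store": 5, "load": 5, "branch": 2}

programs = {
    "A": {"arith": 650, "store": 100, "load": 600, "branch": 50},
    "B": {"arith": 750, "store": 250, "load": 500, "branch": 500},
}


def summarize(counts):
    """Return (average CPI, execution time in seconds) for one instruction mix."""
    total_cycles = sum(n * CYCLES[kind] for kind, n in counts.items())
    total_instructions = sum(counts.values())
    return total_cycles / total_instructions, total_cycles / CLOCK_HZ


for name, counts in programs.items():
    # Same program with the number of load instructions cut in half.
    halved = {**counts, "load": counts["load"] // 2}
    for label, mix in (("original", counts), ("loads halved", halved)):
        cpi, time_s = summarize(mix)
        print(f"{name} ({label}): CPI = {cpi:.2f}, time = {time_s * 1e6:.3f} us")
```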

You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives the instruction frequencies for a benchmark program, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.

| Instruction Type        | Frequency | CPI |
|-------------------------|-----------|-----|
| Loads & Stores          | 30%       | 6   |
| Arithmetic Instructions | 50%       | 4   |
| All Others              | 20%       | 3   |

- a) Calculate the CPI for the benchmark program.

- b) The hardware expert says that if you double the number of registers, the cycle time must be increased by 20%. What would the new clock speed be (in MHz)?
- c) The compiler expert says that if you double the number of registers, then the compiler will generate code that requires only half the number of Loads & Stores. What would the new CPI be on the benchmark?
- d) How many CPU seconds will the benchmark take if we double the number of registers (taking into account both changes described above)?

#### Solution:

- a) CPI:  $0.3 \times 6 + 0.5 \times 4 + 0.2 \times 3 = 4.4$

  The corresponding MIPS rating at 200 MHz is  $(200 \times 10^6)/(4.4 \times 10^6) = 45.45$

- b) Old cycle time:  $\frac{1}{200 \times 10^6}$  sec =  $0.005 \times 10^{-6}$  sec = 5 ns

  New cycle time (20% longer):  $0.005 \times 10^{-6} \text{ sec} + 0.001 \times 10^{-6} \text{ sec} = 0.006 \times 10^{-6} \text{ sec}$  = 6 ns

  New clock speed,  $f = \frac{1}{0.006 \times 10^{-6} \text{ sec}} = 166.67 \times 10^{6}$  Hz = 166.67 MHz

- c) Let us assume that the total number of instructions was initially 100 (Loads & Stores = 30, Arithmetic = 50 and Others = 20).

  Half the number of Loads & Stores = 30/2 = 15, so the new instruction count is 85.

  New CPI:  $(15 \times 6 + 50 \times 4 + 20 \times 3)/85 = (90 + 200 + 60)/85 = 350/85 \approx 4.12$ 

- d) Taking b) (new clock speed = 166.67 MHz) and c) (new CPI = 350/85 ≈ 4.12) into consideration, for the assumed 100-instruction program:

  Execution time:  $85 \times (350/85) \times 1/(166.67 \times 10^6)$  sec =  $350 \times 6$  ns =  $2.10 \times 10^{-6}$  sec
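The trade-off between the slower clock and the reduced instruction count can be checked with the Python sketch below. It uses the same assumed 100-instruction program as the worked solution; the variable names are illustrative. It also computes the original execution time for comparison, which the problem does not ask for.

```python
OLD_CLOCK_HZ = 200e6

# Assumed original program of 100 instructions, matching the 30% / 50% / 20% split.
old_counts = {"loads_stores": 30, "arithmetic": 50, "others": 20}
CPI = {"loads_stores": 6, "arithmetic": 4, "others": 3}

# Doubling the registers: cycle time grows by 20%, loads and stores are halved.
old_cycle_s = 1 / OLD_CLOCK_HZ
new_cycle_s = 1.20 * old_cycle_s
new_clock_hz = 1 / new_cycle_s
new_counts = {**old_counts, "loads_stores": old_counts["loads_stores"] / 2}


def total_cycles(counts):
    """Total clock cycles for a dict of instruction counts per type."""
    return sum(n * CPI[kind] for kind, n in counts.items())


old_cpi = total_cycles(old_counts) / sum(old_counts.values())
new_cpi = total_cycles(new_counts) / sum(new_counts.values())
old_time_s = total_cycles(old_counts) * old_cycle_s
new_time_s = total_cycles(new_counts) * new_cycle_s

print(f"new clock = {new_clock_hz / 1e6:.2f} MHz")
print(f"CPI: {old_cpi:.2f} -> {new_cpi:.2f}")
print(f"time: {old_time_s * 1e6:.2f} us -> {new_time_s * 1e6:.2f} us")
```

Note that the new CPI is only slightly lower than the old one; the saving comes mainly from executing fewer instructions.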

A computer  $M_2$  has the following CPIs for instruction types A thru D, and a program  $P_3$  has the following mix of instructions.

 $M_2$ : Type A CPI<sub>A</sub> = 1.7 Type B CPI<sub>B</sub> = 2.1 Type C CPI<sub>C</sub> = 2.7 Type D CPI<sub>D</sub> = 2.4

 $P_3$ : Type A = 22% Type B = 29% Type C = 17% Type D = remaining %

a) Calculate the average CPI of machine M<sub>2</sub> for program P<sub>3</sub>.

#### Solution:

a) Share of Type D instructions:  $100\% - (22\% + 29\% + 17\%) = 32\%$ 

Average CPI:  $0.22 \times 1.7 + 0.29 \times 2.1 + 0.17 \times 2.7 + 0.32 \times 2.4 = 0.374 + 0.609 + 0.459 + 0.768 = 2.21$ 

Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has a clock rate of 100 MHz. The average number of cycles for each instruction class and their frequencies (for a typical program) are as follows:

| Instruction Class | Machine M1 (Cycles/Instruction) | Machine M2 (Cycles/Instruction) | % of Instructions |
|-------------------|---------------------------------|---------------------------------|-------------------|
| A                 | 1                               | 2                               | 40%               |
| B                 | 2                               | 4                               | 30%               |
| C                 | 4                               | 3                               | 30%               |

Calculate the average CPI for each machine, M1, and M2.

Solution:

Machine M1:

Average CPI =  $0.4 \times 1 + 0.3 \times 2 + 0.3 \times 4 = 2.2$ 

Machine M2:

Average CPI =  $0.4 \times 2 + 0.3 \times 4 + 0.3 \times 3 = 2.9$ 
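For percentages like these, exact rational arithmetic avoids floating-point rounding altogether. The Python sketch below uses the standard `fractions.Fraction` type; the MIPS comparison at the end goes beyond what the problem asks and is included only because the clock rates are given. The names `classes`, `clock_mhz`, and `average_cpi` are assumptions for this illustration.

```python
from fractions import Fraction

# class -> (CPI on M1, CPI on M2, percentage of instructions)
classes = {"A": (1, 2, 40), "B": (2, 4, 30), "C": (4, 3, 30)}
clock_mhz = {"M1": 80, "M2": 100}


def average_cpi(machine_index):
    """Exact weighted average CPI; machine_index 0 selects M1, 1 selects M2."""
    return sum(Fraction(row[2], 100) * row[machine_index]
               for row in classes.values())


cpi = {"M1": average_cpi(0), "M2": average_cpi(1)}

# MIPS rating = clock rate in MHz / average CPI (optional extension).
mips = {name: clock_mhz[name] / cpi[name] for name in cpi}

for name in cpi:
    print(f"{name}: CPI = {float(cpi[name]):.2f}, MIPS = {float(mips[name]):.2f}")
```

Under these figures the lower-clocked M1 still achieves the higher MIPS rating, because its average CPI is lower by a larger factor than its clock is slower.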